Features for automatic discourse analysis of paragraphs

Authors

  • Daphne Theijssen
  • Hans van Halteren
  • Suzan Verberne
Abstract

In this paper, we investigate which information is useful for detecting rhetorical (RST) relations between (Multi-)Sentential Discourse Units ((M-)SDUs), text spans consisting of one or more sentences, within the same paragraph. To do so, we simplified the task of discourse parsing to a decision problem: deciding whether an (M-)SDU is rhetorically related to a preceding or to a following (M-)SDU. Using the RST Treebank (Carlson et al. 2003), we offered this choice to machine learning algorithms together with syntactic, lexical, referential, discourse and surface features. Next, the features were ranked on the basis of (1) models established by the classification algorithms and (2) feature selection metrics. Highly ranked features that predict the presence of a rhetorical relation are syntactic similarity, word overlap, word similarity, continuous punctuation and many reference features. Other features are used to introduce new topics or arguments: time references, proper nouns, definite articles and the word "further".
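To make the attachment decision concrete, the following is a minimal sketch of a single lexical feature, word overlap, used to choose between the preceding and the following span. It is an illustration only: the abstract does not say how word overlap was computed, so the Jaccard measure, the regex tokenizer, the tie-breaking rule and the function names `tokenize`, `word_overlap` and `attach_direction` are all assumptions, and the paper itself combines many features in trained classifiers and does not rely on one hand-written rule.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(span_a, span_b):
    """Jaccard overlap between the word sets of two text spans (assumed measure)."""
    a, b = tokenize(span_a), tokenize(span_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def attach_direction(preceding, current, following):
    """Toy rule: attach to the neighbour that shares more words; ties go to 'preceding'."""
    left = word_overlap(preceding, current)
    right = word_overlap(current, following)
    return "preceding" if left >= right else "following"
```

In a setup closer to the one described, a value such as `word_overlap(...)` would be one column in a feature vector passed to a classifier, alongside the syntactic, referential, discourse and surface features.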

Similar resources

Language Complexity, Accuracy and Fluency in Different Types of Writing Paragraph: Do the Raters Notice Such Effect

The aim of the present study was to investigate the effects of two types of paragraph on EFL learners’ written production. It addressed the issue of how three aspects of language production (i.e. complexity, accuracy, and fluency) vary among two types of paragraphs (i.e. paragraphs of chronology and cause-effect) written by EFL learners. Thirty intermediate level learners of English participate...

Automatic Paragraph Segmentation with Lexical and Prosodic Features

As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects...

A Critical Discourse Analysis of the Event of September 11, 2001 in American and Syrian Print Media Discourse

Aiming at highlighting the important role of print media discourse in the implicit transfer of the dominant ideology of discourse context, the present data-driven paper demonstrates how the lexical features of repetition and synonymy as well as the structural and thematic features of passivization, nominalization and predicated theme were utilized by the discourse producers to mediate betwee...

Automatic Prostate Cancer Segmentation Using Kinetic Analysis in Dynamic Contrast-Enhanced MRI

Background: Dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) provides functional information on the microcirculation in tissues by analyzing the enhancement kinetics which can be used as biomarkers for prostate lesions detection and characterization. Objective: The purpose of this study is to investigate spatiotemporal patterns of tumors by extracting semi-quantitative as well as w...

Free Model of Sentence Classifier for Automatic Extraction of Topic Sentences

This research employs a free model that uses only sentential features, without paragraph context, to extract the topic sentences of a paragraph. To find the optimal combination of features, corpus-based classification is used to construct a sentence classifier as the model. The sentence classifier is trained using a Support Vector Machine (SVM). The experiment shows that position and meta-discours...

Publication date: 2008